The Best 1871 Speech Recognition Tools in 2025

Voice Activity Detection

A voice activity detection model based on pyannote.audio 2.1 that identifies speech segments in audio.

Speech Recognition

Wav2vec2 Large Xlsr 53 Portuguese

An XLSR-53 large model fine-tuned for Portuguese speech-to-text on the Common Voice 6.1 dataset.

Speech Recognition Other

Whisper Large V3

Whisper is an advanced automatic speech recognition (ASR) and speech translation model proposed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.

Speech Recognition Supports Multiple Languages

Whisper Large V3 Turbo

Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.

Speech Recognition

Transformers Supports Multiple Languages

Wav2vec2 Large Xlsr 53 Russian

A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Other
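Several of the wav2vec2 models listed here expect 16 kHz input, so audio recorded at another rate has to be resampled first. The sketch below shows the idea with plain linear interpolation on a list of mono samples. The function name `resample_linear` is hypothetical and not part of any model's API, and the method applies no anti-aliasing filter, so a real pipeline would more likely use an established resampler such as `librosa.resample` or `torchaudio.functional.resample`.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation.

    Illustrative only: there is no low-pass filter, so downsampling
    can alias. `samples` is a sequence of numbers (one channel).
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return [float(s) for s in samples]

    # Output length that covers the same duration as the input.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    # How far we advance in the source for each output sample.
    step = src_rate / dst_rate

    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


if __name__ == "__main__":
    one_second_48k = [float(i) for i in range(48000)]
    resampled = resample_linear(one_second_48k, src_rate=48000)
    print(len(resampled))  # one second of audio at the target rate
```

Going from 48 kHz to 16 kHz keeps every third sample position, so one second of input yields 16,000 output samples.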

Wav2vec2 Large Xlsr 53 Chinese Zh Cn

A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Chinese

Wav2vec2 Large Xlsr 53 Dutch

A Dutch speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice and CSS10 datasets; it expects audio sampled at 16 kHz.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio sampled at 16 kHz.

Speech Recognition Japanese

Mms 300m 1130 Forced Aligner

A text-to-audio forced alignment tool based on Hugging Face pre-trained models, supporting multiple languages with high memory efficiency

Speech Recognition

Transformers Supports Multiple Languages

Wav2vec2 Large Xlsr 53 Arabic

An Arabic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on Common Voice and the Arabic Speech Corpus.

Speech Recognition Arabic

Wav2vec2 Base 960h

The Wav2Vec2 base model developed by Facebook, pre-trained and fine-tuned on 960 hours of LibriSpeech audio for English automatic speech recognition tasks.

Speech Recognition

Transformers English

Wav2vec2 Large Xlsr Korean

Korean Automatic Speech Recognition (ASR) model based on Wav2Vec2 XLSR architecture, excelling on the Zeroth Korean dataset

Speech Recognition

Transformers Korean

Wav2vec2 Large Xlsr Hindi

A Hindi automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on low-resource Indian language datasets.

Speech Recognition

Transformers Other

Wav2vec2 Xls R 300m Ftspeech

A Danish automatic speech recognition model based on facebook/wav2vec2-xls-r-300m, fine-tuned on a Danish parliamentary speech dataset.

Speech Recognition

Transformers Other

Wav2vec2 Xls R 300m Hebrew

A Hebrew automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m in two stages, first on a small-scale dataset and then on a large-scale one.

Speech Recognition

Transformers Other

Filipino Wav2vec2 L Xls R 300m Official

A speech recognition model based on facebook/wav2vec2-xls-r-300m, fine-tuned on Filipino speech datasets.

Speech Recognition

Faster Whisper Base

This is the CTranslate2 converted version of OpenAI's Whisper base model, designed for efficient speech recognition tasks.

Speech Recognition Supports Multiple Languages

Faster Whisper Large V2

A CTranslate2 conversion of Whisper large-v2, OpenAI's large-scale automatic speech recognition (ASR) model for multilingual speech-to-text tasks.

Speech Recognition Supports Multiple Languages

Faster Whisper Tiny

CTranslate2 converted version of OpenAI Whisper tiny model for efficient speech recognition

Speech Recognition Supports Multiple Languages

Hubert Large Ls960 Ft

HuBERT-Large is a self-supervised speech representation learning model fine-tuned on 960 hours of LibriSpeech data for automatic speech recognition tasks.

Speech Recognition

Transformers English

Faster Whisper Large V3

A CTranslate2 conversion of Whisper large-v3, OpenAI's large-scale multilingual automatic speech recognition (ASR) model for speech-to-text tasks.

Speech Recognition Supports Multiple Languages

Wav2vec2 Xls R 300m Cv7 Turkish

Automatic speech recognition model fine-tuned for Turkish based on facebook/wav2vec2-xls-r-300m

Speech Recognition

Transformers Other

Wavlm Base Plus

WavLM is a large-scale self-supervised pretrained speech model developed by Microsoft, pretrained on 16kHz sampled speech audio, suitable for various speech processing tasks.

Speech Recognition

Transformers English

Wav2vec2 Xls R 1b Portuguese

This is a Portuguese automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple Portuguese speech datasets.

Speech Recognition

Transformers Other

Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680k hours of labeled data with strong generalization capabilities.

Speech Recognition Supports Multiple Languages

A speech encoder based on the Conformer architecture, pretrained on 4.5 million hours of unlabeled audio data covering more than 143 languages.

Speech Recognition

Transformers Supports Multiple Languages

Distil Large V3

Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focusing on English automatic speech recognition, offering faster inference speeds while maintaining accuracy close to the original model.

Speech Recognition English

Wav2vec2 Large Xlsr 53 Polish

A Polish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.

Speech Recognition Other

Hubert Base Ls960

HuBERT is a self-supervised speech representation learning model that learns speech features through BERT-like prediction loss, suitable for tasks such as speech recognition.

Speech Recognition

Transformers English

WavLM is a large-scale self-supervised speech pre-training model developed by Microsoft, supporting full-stack speech processing tasks and excelling in the SUPERB benchmark.

Speech Recognition

Transformers English

Faster Whisper Small

CTranslate2 converted version of OpenAI Whisper small model for efficient speech recognition

Speech Recognition Supports Multiple Languages

Faster Whisper Base.en

A CTranslate2 conversion of the Whisper base.en model for English speech recognition.

Speech Recognition English

Wav2vec2 Large Robust Ft Libritts Voxpopuli

A speech recognition model based on wav2vec2-large, specifically designed to generate transcribed text with punctuation, suitable for TTS model construction.

Speech Recognition

Whisper Tiny is an automatic speech recognition (ASR) model developed by OpenAI, the smallest version in the Whisper series with 39M parameters.

Speech Recognition Supports Multiple Languages

Wav2vec2 Xlsr 53 Espeak Cv Ft

A multilingual phoneme recognition model based on the pre-trained wav2vec2-large-xlsr-53 and fine-tuned on the Common Voice dataset; it predicts phoneme labels across multiple languages.

Speech Recognition

Whisperkit Coreml

WhisperKit is a local speech recognition framework optimized for Apple Silicon, supporting efficient automatic speech recognition tasks.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Persian

A Persian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.

Speech Recognition Other

Faster Whisper Large V3 Turbo Ct2

This is a version of the Whisper large-v3 turbo model converted to the CTranslate2 format for efficient automatic speech recognition tasks.

Speech Recognition Supports Multiple Languages

Wav2vec2 Large Xlsr 53 English

An English speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained on the Common Voice 6.1 dataset

Speech Recognition English

Wav2vec2 Xls R 300m Cs 250

A Czech automatic speech recognition model based on facebook/wav2vec2-xls-r-300m, fine-tuned on Czech datasets; it expects audio sampled at 16 kHz.

Speech Recognition

Transformers Other

Parakeet Tdt 0.6b V2

An automatic speech recognition model with 600 million parameters, supporting English transcription, punctuation, capitalization, and timestamp prediction

Speech Recognition English

A Ukrainian automatic speech recognition model based on facebook/wav2vec2-xls-r-300m, trained on the Common Voice 10.0 dataset.

Speech Recognition

Transformers Other

AIbase

© 2025 AIbase